Abstract


In this note, we present an information diffusion inequality derived from an elementary
argument, which gives rise to a very general Fano-type inequality. The latter unifies and
generalizes the distance-based Fano inequality and the continuous Fano inequality established in
[DW13, Corollary 1, Propositions 1 and 2], as well as the generalized Fano inequality in [HV94,
Equation following (10)].

1 Introduction 

Fano's inequality is a crucial tool in information theory with numerous applications. Moreover, it
has been heavily used in statistics in the context of minimax theory (see [LC98] and references
contained therein) and more recently also in optimization (see e.g., [RR09, ABRW12, BGP13]) to
lower bound the rate of convergence of estimators and algorithms. The general setup of Fano
inequalities is a Markov chain $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$, and we are interested in the probability of finding
a sufficient reconstruction $\hat{\mathbf{X}}$ of the hidden random variable $\mathbf{X}$ from observations $\mathbf{Y}$. Classically the
measure of sufficiency has been equality, i.e., we ask for perfect reconstructions $\hat{\mathbf{X}} = \mathbf{X}$. This can
be relaxed in several ways, e.g., by accepting reconstructions $\hat{\mathbf{X}}$ whenever $\hat{\mathbf{X}}$ is close to $\mathbf{X}$.

In this note we present an elementary information diffusion inequality, which immediately
gives rise to a very general Fano inequality, extending and subsuming the versions presented in
[DW13]. In particular, we allow for arbitrary relations
$R \subseteq \operatorname{range}(\mathbf{X}) \times \operatorname{range}(\hat{\mathbf{X}})$ indicating a
sufficient reconstruction.

Our notation is standard, as found in [CT06], and consistent with [DW13]. We denote
random variables by capital bold letters such as, e.g., $\mathbf{X}$, and events by script letters, such as $\mathcal{R}$. Let
$\neg\mathcal{R}$ denote the negation of the event $\mathcal{R}$.

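Let $\log$ be a logarithm with an arbitrary base $a > 1$, which also serves as a base for measuring
information, i.e., all information quantities are defined using the base-$a$ logarithm $\log$. Recall that the
Rényi divergence of two distributions $P$ and $Q$ over the same probability space is defined as

\[
D_\alpha(P \,\|\, Q) := \frac{1}{\alpha - 1} \log \int \left(\frac{dP}{d\mu}\right)^{\alpha} \left(\frac{dQ}{d\mu}\right)^{1-\alpha} d\mu
\]

for an order $0 < \alpha < \infty$ with $\alpha \neq 1$, where $\mu$ is any measure dominating both $P$ and $Q$.
By continuity, this extends to orders $0$, $1$ and $\infty$. For the order
$\alpha = 1$ one recovers relative entropy, also known as Kullback-Leibler divergence:

\[
D(P \,\|\, Q) := \int \log \frac{dP}{dQ} \, dP.
\]

When $P$ and $Q$ are Bernoulli distributions with parameters $p$ and $q$ respectively, we obtain the
binary versions

\[
d_\alpha(p \,\|\, q) := \frac{\log\left(p^{\alpha} q^{1-\alpha} + (1-p)^{\alpha} (1-q)^{1-\alpha}\right)}{\alpha - 1},
\qquad
d(p \,\|\, q) := p \log \frac{p}{q} + (1-p) \log \frac{1-p}{1-q}.
\]

The binary Rényi entropy and the binary entropy are defined as

\[
H_\alpha[p] := \frac{\log\left(p^{\alpha} + (1-p)^{\alpha}\right)}{1 - \alpha},
\qquad
H[p] := p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}.
\]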
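The binary quantities above are easy to evaluate numerically. The following is a minimal sketch, assuming natural logarithms (base $e$) and the convention $0 \log 0 = 0$; the function names are illustrative and not taken from the cited references. It compares the Rényi versions at an order close to $1$ with their $\alpha = 1$ counterparts.

```python
import math

def binary_renyi_divergence(p, q, alpha):
    """d_alpha(p || q) in nats, for 0 < alpha < inf with alpha != 1."""
    s = p**alpha * q**(1 - alpha) + (1 - p)**alpha * (1 - q)**(1 - alpha)
    return math.log(s) / (alpha - 1)

def binary_kl(p, q):
    """d(p || q) in nats, with the convention 0 log 0 = 0."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log(a / b)
    return total

def binary_renyi_entropy(p, alpha):
    """H_alpha[p] in nats, for alpha != 1."""
    return math.log(p**alpha + (1 - p)**alpha) / (1 - alpha)

def binary_entropy(p):
    """H[p] in nats, with the convention 0 log 0 = 0."""
    return sum(-a * math.log(a) for a in (p, 1 - p) if a > 0)

# Orders close to 1 should approximately reproduce the alpha = 1 quantities.
p, q, near_one = 0.3, 0.6, 1 + 1e-6
print(binary_renyi_divergence(p, q, near_one), binary_kl(p, q))
print(binary_renyi_entropy(p, near_one), binary_entropy(p))
```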


2 Information diffusion Fano inequality 


In this section we present a general information diffusion inequality, applicable to a broad
range of distributions, including continuous ones. We allow for the specification of an arbitrary
reconstruction relation $R \subseteq \operatorname{range}(\mathbf{X}) \times \operatorname{range}(\hat{\mathbf{X}})$, where $\mathbf{X}$ is a random variable and $\hat{\mathbf{X}}$ its reconstruction.
We might want to think of $R$ as specifying the acceptable reconstructions, e.g., those with small
$\ell_1$-error.

Our general Fano inequality is inspired by a simple support-based lower bound on relative
entropy, see e.g., [vEH14, Theorem 3]: for any two probability distributions $P$, $Q$ on the same
probability space, with $\operatorname{supp} P$ denoting the support of $P$,

\[
D(P \,\|\, Q) \ge \log \frac{1}{\mathbb{P}_Q[\operatorname{supp} P]}.
\]


The next inequality is an extension of the generalized Fano inequalities in [DW13, Corollary 1
and Proposition 2], where we do not consider the distance between $P_{\mathbf{X}\mathbf{Y}}$ and $P_{\mathbf{X}} \times P_{\mathbf{Y}}$ but rather
between two arbitrary distributions $P_{\mathbf{X}\mathbf{Y}}$ and $Q_{\mathbf{X}\mathbf{Y}}$.


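Proposition 2.1 (Information diffusion Fano inequality). Let $P$ and $Q$ be two probability distributions
on the same probability space and $\mathcal{R}$ an event. Further, choose $0 \le p_{\min} < 1$ and $0 < p_{\max} < 1$ with
$p_{\min} + p_{\max} < 1$ to be numbers satisfying

\[
p_{\min} \le \mathbb{P}_Q[\mathcal{R}] \le p_{\max}. \tag{1}
\]

Then for any order $0 < \alpha < \infty$ with $\alpha \neq 1$:

\[
\mathbb{P}_P[\mathcal{R}] \le
\sqrt[\alpha]{
\frac{\exp\left( (\alpha - 1) \, \dfrac{D_\alpha(P \,\|\, Q) + H_\alpha[\mathbb{P}_P[\mathcal{R}]] + \log(1 - p_{\min})}{\log e} \right) - 1}
{\left( \dfrac{1 - p_{\min}}{p_{\max}} \right)^{\alpha - 1} - 1}
}. \tag{2}
\]

For the order $\alpha = 1$, the following version holds:

\[
\mathbb{P}_P[\mathcal{R}] \le
\frac{D(P \,\|\, Q) + H[\mathbb{P}_P[\mathcal{R}]] + \log(1 - p_{\min})}
{\log \dfrac{1 - p_{\min}}{p_{\max}}}. \tag{3}
\]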
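Proof. The proof is an easy application of the data processing inequality. We shall also use the
inequality

\[
x^{\alpha} + y^{\alpha}
\begin{cases}
\ge (x + y)^{\alpha} & \text{if } \alpha < 1, \\
\le (x + y)^{\alpha} & \text{if } \alpha > 1,
\end{cases}
\qquad x, y \ge 0,
\]

with the choice $x := \mathbb{P}_P[\mathcal{R}]$ and $y := 1 - \mathbb{P}_P[\mathcal{R}]$:

\[
\mathbb{P}_P[\mathcal{R}]^{\alpha} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha}
\begin{cases}
\ge 1 & \text{if } \alpha < 1, \\
\le 1 & \text{if } \alpha > 1.
\end{cases} \tag{4}
\]

One should verify the inequalities below separately for $\alpha < 1$ and $\alpha > 1$.

\begin{align*}
D_\alpha(P \,\|\, Q) + H_\alpha[\mathbb{P}_P[\mathcal{R}]]
&\ge d_\alpha(\mathbb{P}_P[\mathcal{R}] \,\|\, \mathbb{P}_Q[\mathcal{R}]) + H_\alpha[\mathbb{P}_P[\mathcal{R}]]
&& \text{(data processing)} \\
&= \frac{1}{\alpha - 1} \log
\frac{\mathbb{P}_P[\mathcal{R}]^{\alpha} \, \mathbb{P}_Q[\mathcal{R}]^{1-\alpha} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha} (1 - \mathbb{P}_Q[\mathcal{R}])^{1-\alpha}}
{\mathbb{P}_P[\mathcal{R}]^{\alpha} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha}} \\
&\ge \frac{1}{\alpha - 1} \log
\frac{\mathbb{P}_P[\mathcal{R}]^{\alpha} \, p_{\max}^{1-\alpha} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha} (1 - p_{\min})^{1-\alpha}}
{\mathbb{P}_P[\mathcal{R}]^{\alpha} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha}}
&& \text{(by Eq. (1))} \\
&= \frac{1}{\alpha - 1} \log
\frac{\mathbb{P}_P[\mathcal{R}]^{\alpha} \left( \frac{1 - p_{\min}}{p_{\max}} \right)^{\alpha - 1} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha}}
{\mathbb{P}_P[\mathcal{R}]^{\alpha} + (1 - \mathbb{P}_P[\mathcal{R}])^{\alpha}}
- \log(1 - p_{\min}) \\
&\ge \frac{1}{\alpha - 1} \log \left(
\mathbb{P}_P[\mathcal{R}]^{\alpha} \left[ \left( \frac{1 - p_{\min}}{p_{\max}} \right)^{\alpha - 1} - 1 \right] + 1
\right) - \log(1 - p_{\min}).
&& \text{(by Eq. (4))}
\end{align*}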


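The claim follows by rearranging. For the case $\alpha = 1$ we provide two proofs: (1) by taking the limit
$\alpha \to 1$, and (2) via a similar direct argument. To simplify the limit argument, let us introduce
some shorthand notation:

\[
A_\alpha := D_\alpha(P \,\|\, Q) + H_\alpha[\mathbb{P}_P[\mathcal{R}]] + \log(1 - p_{\min}),
\qquad
B := \frac{1 - p_{\min}}{p_{\max}}.
\]

Recall that $\lim_{\alpha \nearrow 1} D_\alpha(P \,\|\, Q) = D(P \,\|\, Q)$, therefore $A_1$ is the numerator of (3). The limit of the
right-hand side of (2) as $\alpha \nearrow 1$ is

\[
\lim_{\alpha \nearrow 1}
\sqrt[\alpha]{\frac{\exp\left( (\alpha - 1) \frac{A_\alpha}{\log e} \right) - 1}{B^{\alpha - 1} - 1}}
= \lim_{\alpha \nearrow 1}
\frac{\left[ \exp\left( (\alpha - 1) \frac{A_\alpha}{\log e} \right) - 1 \right] / (\alpha - 1)}
{\left[ B^{\alpha - 1} - 1 \right] / (\alpha - 1)}
= \frac{A_1 / \log e}{\log B / \log e}
= \frac{A_1}{\log B},
\]

which is exactly the right-hand side of Eq. (3).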



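An alternate proof via a direct computation goes as follows, similar to the proof of Eq. (2):

\begin{align*}
D(P \,\|\, Q) + H[\mathbb{P}_P[\mathcal{R}]]
&\ge d(\mathbb{P}_P[\mathcal{R}] \,\|\, \mathbb{P}_Q[\mathcal{R}]) + H[\mathbb{P}_P[\mathcal{R}]]
&& \text{(data processing)} \\
&= \mathbb{P}_P[\mathcal{R}] \log \frac{\mathbb{P}_P[\mathcal{R}]}{\mathbb{P}_Q[\mathcal{R}]}
+ (1 - \mathbb{P}_P[\mathcal{R}]) \log \frac{1 - \mathbb{P}_P[\mathcal{R}]}{1 - \mathbb{P}_Q[\mathcal{R}]} \\
&\qquad + \mathbb{P}_P[\mathcal{R}] \log \frac{1}{\mathbb{P}_P[\mathcal{R}]}
+ (1 - \mathbb{P}_P[\mathcal{R}]) \log \frac{1}{1 - \mathbb{P}_P[\mathcal{R}]} \\
&= \mathbb{P}_P[\mathcal{R}] \log \frac{1}{\mathbb{P}_Q[\mathcal{R}]}
+ (1 - \mathbb{P}_P[\mathcal{R}]) \log \frac{1}{1 - \mathbb{P}_Q[\mathcal{R}]} \\
&\ge \mathbb{P}_P[\mathcal{R}] \log \frac{1}{p_{\max}}
+ (1 - \mathbb{P}_P[\mathcal{R}]) \log \frac{1}{1 - p_{\min}}.
&& \text{(by Eq. (1))}
\end{align*}

Rearranging finishes the proof. □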
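As a numerical sanity check of Proposition 2.1, the following sketch evaluates the bounds (3) and (2) on random finite distributions. It is an illustration under stated assumptions, not part of the proof: logarithms are natural (so $\log e = 1$), only the orders $\alpha = 1$ and $\alpha = 2$ are exercised, the event $\mathcal{R}$ is arbitrarily taken to be the first two outcomes, and the helper names `random_dist` and `worst_slack` are hypothetical.

```python
import math
import random

def kl(P, Q):
    """Relative entropy D(P || Q) in nats for finite distributions."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

def renyi(P, Q, alpha):
    """Renyi divergence D_alpha(P || Q) in nats, alpha != 1."""
    s = sum(p**alpha * q**(1 - alpha) for p, q in zip(P, Q))
    return math.log(s) / (alpha - 1)

def h(p, alpha=1.0):
    """Binary (Renyi) entropy in nats; alpha = 1 gives the Shannon case."""
    if alpha == 1.0:
        return sum(-a * math.log(a) for a in (p, 1 - p) if a > 0)
    return math.log(p**alpha + (1 - p)**alpha) / (1 - alpha)

def random_dist(n, rng):
    """Random distribution with full support on n outcomes."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def worst_slack(alpha, trials=2000, n=6, seed=0):
    """Smallest value of (bound - P_P[R]) over random instances."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        P, Q = random_dist(n, rng), random_dist(n, rng)
        p_R, q_R = sum(P[:2]), sum(Q[:2])    # event R = first two outcomes
        p_max = q_R
        p_min = q_R if 2 * q_R < 1 else 0.0  # keeps p_min + p_max < 1
        B = (1 - p_min) / p_max
        if alpha == 1.0:
            A = kl(P, Q) + h(p_R) + math.log(1 - p_min)
            bound = A / math.log(B)          # right-hand side of (3)
        else:
            A = renyi(P, Q, alpha) + h(p_R, alpha) + math.log(1 - p_min)
            ratio = math.expm1((alpha - 1) * A) / (B**(alpha - 1) - 1)
            bound = max(ratio, 0.0) ** (1 / alpha)  # right-hand side of (2)
        worst = min(worst, bound - p_R)
    return worst

print(worst_slack(1.0), worst_slack(2.0))
```

A nonnegative slack in both cases is what the proposition predicts for these two orders; the check says nothing about other orders or about continuous distributions.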


We obtain a very general version of Fano's inequality as a consequence. This general form does
not require any specific distributional assumptions on $\mathbf{X}$ such as, e.g., uniformity. The case $p_{\min} = 0$
is [HV94, Equation following (10)].

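Proposition 2.2 (Fano inequality for arbitrary relations). Let $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$ be a Markov chain of
random variables and let $R$ be any set of values $(x, \hat{x})$ with $x \in \operatorname{range}(\mathbf{X})$ and $\hat{x} \in \operatorname{range}(\hat{\mathbf{X}})$. Further,
choose $0 \le p_{\min} < 1$, $0 < p_{\max} < 1$ with $p_{\min} + p_{\max} < 1$ to be numbers satisfying

\[
p_{\min} \le \inf_{\hat{x}} \mathbb{P}[(\mathbf{X}, \hat{x}) \in R]
\qquad \text{and} \qquad
p_{\max} \ge \sup_{\hat{x}} \mathbb{P}[(\mathbf{X}, \hat{x}) \in R].
\]

Let $\mathcal{R}$ denote the event $(\mathbf{X}, \hat{\mathbf{X}}) \in R$. Then

\[
\mathbb{P}[\mathcal{R}]
\le \frac{I[\mathbf{X}; \hat{\mathbf{X}}] + H[\mathbb{P}[\mathcal{R}]] + \log(1 - p_{\min})}{\log \dfrac{1 - p_{\min}}{p_{\max}}}
\le \frac{I[\mathbf{X}; \mathbf{Y}] + H[\mathbb{P}[\mathcal{R}]] + \log(1 - p_{\min})}{\log \dfrac{1 - p_{\min}}{p_{\max}}}. \tag{5}
\]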


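Proof. The second inequality is equivalent to the data processing inequality $I[\mathbf{X}; \hat{\mathbf{X}}] \le I[\mathbf{X}; \mathbf{Y}]$.
The first inequality is the following special case of Proposition 2.1. We choose $P$ to be the joint
distribution of $(\mathbf{X}, \hat{\mathbf{X}})$, which is the distribution used in the statement, i.e., $\mathbb{P}[\mathcal{R}] = \mathbb{P}_P[\mathcal{R}]$. We
choose $Q$ to be the product of the marginal distributions of $\mathbf{X}$ and $\hat{\mathbf{X}}$, therefore $D(P \,\|\, Q) = I[\mathbf{X}; \hat{\mathbf{X}}]$.
Finally,

\[
\mathbb{P}_Q[\mathcal{R}] = \mathbb{P}_Q[(\mathbf{X}, \hat{\mathbf{X}}) \in R]
= \mathbb{E}_{\hat{x} \sim \hat{\mathbf{X}}} \left[ \mathbb{P}[(\mathbf{X}, \hat{x}) \in R] \right]
\ge \inf_{\hat{x}} \mathbb{P}[(\mathbf{X}, \hat{x}) \in R] \ge p_{\min},
\]

and similarly, $\mathbb{P}_Q[\mathcal{R}] \le p_{\max}$. Therefore the conditions of Proposition 2.1 are satisfied, and its
conclusion provides the first inequality in (5). □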
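To make the first inequality in (5) concrete, here is a small numerical sketch on a finite alphabet. Everything in it is an illustrative assumption rather than part of the source: $\mathbf{X}$ is uniform on $\{0, \dots, n-1\}$, the reconstruction $\hat{\mathbf{X}}$ equals $\mathbf{X}$ with probability `stay` and is uniform otherwise, the relation is $|x - \hat{x}| \le$ `radius`, and logarithms are natural.

```python
import math

def fano_relation_check(n=64, stay=0.1, radius=1):
    """Return (P[R], right-hand side of the first inequality in (5))."""
    # Joint distribution of (X, X_hat) for the toy channel described above.
    joint = [[(stay * (x == xh) + (1 - stay) / n) / n for xh in range(n)]
             for x in range(n)]
    p_x = [sum(row) for row in joint]
    p_xh = [sum(joint[x][xh] for x in range(n)) for xh in range(n)]

    def in_R(x, xh):
        return abs(x - xh) <= radius

    # Mutual information I[X; X_hat] in nats.
    mi = sum(joint[x][xh] * math.log(joint[x][xh] / (p_x[x] * p_xh[xh]))
             for x in range(n) for xh in range(n) if joint[x][xh] > 0)

    # Probability of the event R under the joint distribution.
    p_R = sum(joint[x][xh] for x in range(n) for xh in range(n) if in_R(x, xh))

    # P[(X, xh) in R] for each fixed xh, using only the marginal of X.
    per_xh = [sum(p_x[x] for x in range(n) if in_R(x, xh)) for xh in range(n)]
    p_min, p_max = min(per_xh), max(per_xh)
    assert p_min + p_max < 1

    h = -p_R * math.log(p_R) - (1 - p_R) * math.log(1 - p_R)
    bound = (mi + h + math.log(1 - p_min)) / math.log((1 - p_min) / p_max)
    return p_R, bound

p_R, bound = fano_relation_check()
print(p_R, bound)
```

Taking `p_min` and `p_max` as the exact minimum and maximum over $\hat{x}$ is the tightest admissible choice; any looser pair satisfying the hypotheses would also be valid.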

We immediately obtain the following corollary by rearranging (5). The condition $p_{\min} + p_{\max} < 1$
is no longer needed, as it was only used to preserve the direction of inequality while dividing by
$\log[(1 - p_{\min}) / p_{\max}]$. This step can be omitted by a direct proof, consisting of repeating the last
computation in the proof of Proposition 2.1 and then rearranging.
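Corollary 2.3 (Entropy version of Fano inequality). Let $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$ be a Markov chain of random
variables and let $R$ be any set of values $(x, \hat{x})$ with $x \in \operatorname{range}(\mathbf{X})$ and $\hat{x} \in \operatorname{range}(\hat{\mathbf{X}})$. With notation from
Proposition 2.2 we have

\[
H[\mathbf{X} \mid \hat{\mathbf{X}}]
\le H[\mathbf{X}] + \log p_{\max} + H[\mathbb{P}[\neg\mathcal{R}]] + \mathbb{P}[\neg\mathcal{R}] \log \frac{1 - p_{\min}}{p_{\max}}.
\]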


Moreover, if $\mathbf{Y} = (\mathbf{Y}_1, \dots, \mathbf{Y}_n)$ is obtained via independent sampling from a hidden distribution
specified by $\mathbf{X}$, i.e., $\mathbf{Y}_1, \dots, \mathbf{Y}_n \mid \mathbf{X}$ are i.i.d., then we obtain the following corollary, which
is sufficient for many applications. The version with the relative entropy is obtained as a direct
consequence of the convexity of the relative entropy.

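Corollary 2.4 (Fano inequality for independent samples). Let $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$ be a Markov chain of
random variables with $\mathbf{Y} = (\mathbf{Y}_1, \dots, \mathbf{Y}_n)$, so that $\mathbf{Y}_1, \dots, \mathbf{Y}_n \mid \mathbf{X}$ are i.i.d. Further, let $R$ be any set of values
$(x, \hat{x})$ with $x \in \operatorname{range}(\mathbf{X})$ and $\hat{x} \in \operatorname{range}(\hat{\mathbf{X}})$. With notation from Proposition 2.2 we have

\[
\mathbb{P}[\mathcal{R}]
\le \frac{n \cdot I[\mathbf{X}; \mathbf{Y}_1] + H[\mathbb{P}[\mathcal{R}]] + \log(1 - p_{\min})}{\log \dfrac{1 - p_{\min}}{p_{\max}}}
\le \frac{n \cdot \beta + H[\mathbb{P}[\mathcal{R}]] + \log(1 - p_{\min})}{\log \dfrac{1 - p_{\min}}{p_{\max}}},
\]

where $\beta := \max_{x, x' \in \operatorname{range}(\mathbf{X})} D(\mathbf{Y}_1 \mid \mathbf{X} = x \,\|\, \mathbf{Y}_1 \mid \mathbf{X} = x')$.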
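The two facts behind Corollary 2.4, namely $I[\mathbf{X}; \mathbf{Y}] \le n \cdot I[\mathbf{X}; \mathbf{Y}_1] \le n \cdot \beta$, can be checked by brute force on a toy model. The sketch below assumes a uniform $\mathbf{X}$ over a finite set of hypothetical Bernoulli parameters `thetas`, natural logarithms, and a small $n$ so that all $2^n$ sample sequences can be enumerated.

```python
import itertools
import math

def bernoulli_kl(a, b):
    """Relative entropy between Bernoulli(a) and Bernoulli(b) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def mutual_information_iid(thetas, n):
    """I[X; (Y_1, ..., Y_n)] in nats, with X uniform over the indices of
    thetas and Y_i | X = x i.i.d. Bernoulli(thetas[x])."""
    k = len(thetas)
    mi = 0.0
    for y in itertools.product((0, 1), repeat=n):
        ones = sum(y)
        cond = [t**ones * (1 - t)**(n - ones) for t in thetas]  # P[Y = y | X = x]
        marg = sum(cond) / k                                     # P[Y = y]
        mi += sum(c / k * math.log(c / marg) for c in cond if c > 0)
    return mi

thetas = [0.4, 0.6]
n = 5
i_n = mutual_information_iid(thetas, n)
i_1 = mutual_information_iid(thetas, 1)
beta = max(bernoulli_kl(a, b) for a in thetas for b in thetas)
print(i_n, n * i_1, n * beta)
```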


2.1 Special cases 

We will now show how to obtain [DW13, Corollary 1, Propositions 1 and 2] as special cases of the
general Fano inequality from above by choosing the relation $R$ accordingly.


Distance-based Fano inequality 

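For the distance-based case, let $\rho \colon \operatorname{range}(\mathbf{X}) \times \operatorname{range}(\mathbf{X}) \to \mathbb{R}$ be a symmetric function, typically
a metric. Let $\mathbf{X}$ be a discrete random variable with $2 \le |\operatorname{range}(\mathbf{X})| < \infty$. Furthermore let $\hat{\mathbf{X}}$
denote the reconstruction and assume $\operatorname{range}(\hat{\mathbf{X}}) = \operatorname{range}(\mathbf{X})$. For a given radius $t$ denote
$P_t := \mathbb{P}[\rho(\mathbf{X}, \hat{\mathbf{X}}) > t]$. We then obtain as corollary, in the case where $\mathbf{X}$ is uniform: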

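Corollary 2.5 (Distance-based Fano inequality [DW13, Proposition 1]). Let $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$ be a Markov
chain of random variables with $\mathbf{X}$ uniform. For a given radius $t > 0$ define

\[
N_t^{\max} := \max_{\hat{x}} \left| \{ x \mid \rho(x, \hat{x}) \le t \} \right|
\qquad \text{and} \qquad
N_t^{\min} := \min_{\hat{x}} \left| \{ x \mid \rho(x, \hat{x}) \le t \} \right|,
\]

then

\[
H[P_t] + P_t \log \frac{|\operatorname{range}(\mathbf{X})| - N_t^{\min}}{N_t^{\max}} + \log N_t^{\max}
\ge H[\mathbf{X} \mid \hat{\mathbf{X}}].
\]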

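Proof. We pick $R := \{ (x, \hat{x}) \in \operatorname{range}(\mathbf{X}) \times \operatorname{range}(\hat{\mathbf{X}}) \mid \rho(x, \hat{x}) \le t \}$, so that $\mathbb{P}[\neg\mathcal{R}] = P_t$, and choose
$p_{\min} := \frac{N_t^{\min}}{|\operatorname{range}(\mathbf{X})|}$ and $p_{\max} := \frac{N_t^{\max}}{|\operatorname{range}(\mathbf{X})|}$. By Corollary 2.3, using $H[\mathbf{X}] \le \log |\operatorname{range}(\mathbf{X})|$,

\begin{align*}
H[\mathbf{X} \mid \hat{\mathbf{X}}]
&\le H[\mathbf{X}] + \log \frac{N_t^{\max}}{|\operatorname{range}(\mathbf{X})|} + H[P_t]
+ P_t \log \frac{1 - \frac{N_t^{\min}}{|\operatorname{range}(\mathbf{X})|}}{\frac{N_t^{\max}}{|\operatorname{range}(\mathbf{X})|}} \\
&\le \log |\operatorname{range}(\mathbf{X})| + \log \frac{N_t^{\max}}{|\operatorname{range}(\mathbf{X})|} + H[P_t]
+ P_t \log \frac{|\operatorname{range}(\mathbf{X})| - N_t^{\min}}{N_t^{\max}} \\
&= \log N_t^{\max} + H[P_t] + P_t \log \frac{|\operatorname{range}(\mathbf{X})| - N_t^{\min}}{N_t^{\max}},
\end{align*}

as claimed. □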
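The quantities $N_t^{\max}$ and $N_t^{\min}$ are simple ball counts, and the inequality of Corollary 2.5 can be checked directly on a toy example. The sketch below is an illustration under assumptions that are not in the source: the alphabet is $\{0, \dots, n-1\}$ with $\rho(x, \hat{x}) = |x - \hat{x}|$, $\mathbf{X}$ is uniform, $\hat{\mathbf{X}}$ equals $\mathbf{X}$ with probability `stay` and is uniform otherwise, and logarithms are natural.

```python
import math

def distance_fano_check(n=64, stay=0.1, t=1):
    """Return (left-hand side of Corollary 2.5, H[X | X_hat]) in nats."""
    def rho(x, xh):
        return abs(x - xh)  # illustrative metric on {0, ..., n-1}

    # Ball sizes |{x : rho(x, xh) <= t}| for each centre xh.
    balls = [sum(1 for x in range(n) if rho(x, xh) <= t) for xh in range(n)]
    n_max, n_min = max(balls), min(balls)

    # Joint distribution of (X, X_hat) for the toy channel described above.
    joint = [[(stay * (x == xh) + (1 - stay) / n) / n for xh in range(n)]
             for x in range(n)]
    p_xh = [sum(joint[x][xh] for x in range(n)) for xh in range(n)]

    # Conditional entropy H[X | X_hat].
    cond_entropy = -sum(joint[x][xh] * math.log(joint[x][xh] / p_xh[xh])
                        for x in range(n) for xh in range(n))

    # P_t = P[rho(X, X_hat) > t] and its binary entropy.
    p_t = sum(joint[x][xh] for x in range(n) for xh in range(n)
              if rho(x, xh) > t)
    h = -p_t * math.log(p_t) - (1 - p_t) * math.log(1 - p_t)

    lhs = h + p_t * math.log((n - n_min) / n_max) + math.log(n_max)
    return lhs, cond_entropy

lhs, cond_entropy = distance_fano_check()
print(lhs, cond_entropy)
```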


Note that we require $\mathbf{X}$ to be uniform in Corollary 2.5 to easily match the form of [DW13,
Proposition 1]. However, the uniformity requirement can be removed. With the same choice for $R$, we
also immediately obtain [DW13, Corollary 1], either by following the approach in [DW13] or by
directly invoking Proposition 2.2.

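Corollary 2.6 (Mutual information version of distance-based Fano inequality [DW13, Proposition 2]). With
the notation of Corollary 2.5 let $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$ be a Markov chain of random variables with $\mathbf{X}$ uniform. For
any radius $t > 0$ we have

\[
P_t \ge 1 - \frac{I[\mathbf{X}; \mathbf{Y}] + H[P_t]}{\log \dfrac{|\operatorname{range}(\mathbf{X})|}{N_t^{\max}}}.
\]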


Continuous Fano inequality 

As a next step, we show how to obtain the continuous Fano inequality of [DW13], avoiding the
discretization argument altogether. Our version is slightly more general.

Let $\mathbf{X}$ be a continuous random variable such that $\operatorname{range}(\mathbf{X})$ has finite non-zero Lebesgue
measure. Moreover, let $\operatorname{range}(\hat{\mathbf{X}}) = \operatorname{range}(\mathbf{X})$ as in the discrete distance-based setup. With the notation
from above, we define $B_\rho(t, \hat{x}) := \{ x \in \operatorname{range}(\mathbf{X}) \mid \rho(x, \hat{x}) \le t \}$. We obtain

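Corollary 2.7 (Continuous Fano inequality [DW13, Proposition 2]). Let $\mathbf{X} \to \mathbf{Y} \to \hat{\mathbf{X}}$ be a Markov chain of
random variables with $\mathbf{X}$ uniform. For a given radius $t > 0$ we have

\[
P_t \ge 1 - \frac{I[\mathbf{X}; \mathbf{Y}] + \log 2}
{\log \dfrac{\operatorname{vol}(\operatorname{range}(\mathbf{X}))}{\sup_{\hat{x}} \operatorname{vol}(B_\rho(t, \hat{x}) \cap \operatorname{range}(\mathbf{X}))}}.
\]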

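Proof. As before, we choose $R := \{ (x, \hat{x}) \in \operatorname{range}(\mathbf{X}) \times \operatorname{range}(\hat{\mathbf{X}}) \mid \rho(x, \hat{x}) \le t \}$, so that
$\mathbb{P}[\mathcal{R}] = 1 - P_t$. We apply Proposition 2.2 with the choice $p_{\min} = 0$ and
$p_{\max} = \frac{\sup_{\hat{x}} \operatorname{vol}(B_\rho(t, \hat{x}) \cap \operatorname{range}(\mathbf{X}))}{\operatorname{vol}(\operatorname{range}(\mathbf{X}))}$ and obtain

\[
1 - P_t \le \frac{I[\mathbf{X}; \mathbf{Y}] + H[P_t]}
{\log \dfrac{\operatorname{vol}(\operatorname{range}(\mathbf{X}))}{\sup_{\hat{x}} \operatorname{vol}(B_\rho(t, \hat{x}) \cap \operatorname{range}(\mathbf{X}))}},
\]

which, after bounding $H[P_t] \le \log 2$, is the claim rearranged. □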
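For a concrete instance of the continuous bound, consider the illustrative case (an assumption, not taken from the source) of $\mathbf{X}$ uniform on $[0, 1]$ with $\rho(x, \hat{x}) = |x - \hat{x}|$. Then $\operatorname{vol}(\operatorname{range}(\mathbf{X})) = 1$ and the largest ball intersected with the range has length $\min(2t, 1)$. The sketch below evaluates the resulting lower bound on $P_t$ for a given mutual information in nats; the function name is hypothetical.

```python
import math

def continuous_fano_lower_bound(mutual_info_nats, t):
    """Lower bound on P_t from Corollary 2.7 for X uniform on [0, 1]
    and rho(x, x_hat) = |x - x_hat|."""
    vol_range = 1.0
    # Largest length of [x_hat - t, x_hat + t] intersected with [0, 1].
    sup_ball = min(2 * t, 1.0)
    if sup_ball >= vol_range:
        return 0.0  # the denominator vanishes; the bound is uninformative
    return 1 - (mutual_info_nats + math.log(2)) / math.log(vol_range / sup_ball)

# With no information about X, a radius of 0.01 is missed most of the time.
print(continuous_fano_lower_bound(0.0, 0.01))
```

The returned value can be negative when the mutual information is large relative to $\log(1 / (2t))$, in which case the bound is vacuous.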